Abstract


This study explores the relationship between evaluation policies and evaluation practice. Through 
document analysis, interviews, and a multiple case study, the research examined the explicit and 
implicit policies overarching the evaluation work commissioned by the Robert Wood Johnson 
Foundation (RWJF) and explored how these policies are implemented in the field. This examination 
of evaluation policies at RWJF has pointed out some significant strengths, including emphasis on the 
importance of evaluation; collegiality in defining, formulating, and monitoring evaluations; using a 
variety of evaluation products to communicate results; and use of evaluation advisory committees to 
strengthen evaluation approaches. However, these policies have evolved somewhat haphazardly 
over time. Consequently, some written policies are absent or inadequate and some policies are 
followed with less consistency than others. The findings point to the importance of a comprehensive 
and integrated set of evaluation policies grounded in intended outcomes and the need for additional 
studies on this topic. 

Nonprofits have made strides to become more systematic in describing the effects of what they do, 
but the public still has very little reliable information that speaks to the results of the programs and 
interventions (Flynn & Hodgkinson, 2001; Liket, Rey-Garcia, & Maas, 2014). Specifically, founda- 
tions are criticized—largely by trustees and executives—for the lack of useful evaluation results 
(Behrens & Kelly, 2008). These criticisms are partially magnified because of the role foundations 
play as brokers of funds to social services providers. Porter and Kramer (1999) argued that to 
compensate for their position as expensive “middlemen,” foundations should create benefit in ways 
that extend beyond their purchasing power by improving, among other things, the performance of 
grant recipients and the state of knowledge and practice. 

To accomplish these things, foundations need reliable information about how and why the 
programs that they fund work to effect change—information gathered through evaluations. Evalua- 
tion policies concerning preferred methods, levels of stakeholder involvement, available resources, 
and information about project management will affect the type and quality of data received 
(Trochim, 2009). Such policies might be by design or unintentional; they may be explicitly articu- 
lated or implicitly understood. Even when invisible, evaluation policies both “enable and constrain 
the potential contributions evaluations can make” (Mark, Cooksey, & Trochim, 2009, p. 3). They 
can communicate what an organization values and influence the level of resources devoted to 
evaluation efforts, theoretical predispositions toward evaluation, and how evaluation findings are 
used (Trochim, 2009). 

Despite these assertions that evaluation policy is a critical issue facing the field, there have been 
limited empirical investigations of these issues to date. The present study provides an in-depth, field- 
based exploration of evaluation policies, including their content and implementation. Through a 
qualitative study that included record analysis, interviews, and case studies, this study explored the 
evaluation policies of a particular nonprofit organization to examine how they are implemented in 
real-world settings. The study was guided by the following research questions: 


1. What are the evaluation policies of the Robert Wood Johnson Foundation (RWJF)? How are 
they developed and communicated to key stakeholders? How are they interpreted? 

2. What is the extent to which evaluations are implemented as described by the Foundation’s 
policies? What are the barriers to and supports for successful implementation? 


Evaluation Policy Context 


Evaluation policy is an important matter facing the field of evaluation today (Trochim, 2009). It 
encompasses all other aspects of evaluation practice—topics of debate among practitioners and 
scholars including methods, stakeholders, use, dissemination, and so on. Moreover, evaluation 
policies have direct effects on what social programs are funded and their perceived value (Datta, 
2009). Regardless of how intentionally policies are articulated, every organization that engages in 
evaluation has evaluation policies, and thus all evaluations are in some way influenced by those 
policies (Trochim, 2009). But, all too often, too little attention is expressly paid to developing the 
policies that guide evaluation efforts (Mark et al., 2009). 

Evaluation policy has been defined as “any rule or principle that a group or organization uses to 
guide its decisions and actions when doing evaluation” (Trochim, 2009, p. 16). There has been some 
consideration in the literature of the specific components that might make up an evaluation policy. 
Mark, Cooksey, and Trochim (2009) noted that methods are the most obvious candidate for elaboration, 
but evaluation policy can describe myriad evaluation features. The American Evaluation 
Association’s (AEA) Evaluation Policy Task Force outlined seven areas of evaluation policy: 
definitions, requirements, methods, human resources, budgets, implementation, and ethics (AEA, 
2007). Trochim’s (2009) evaluation policy framework includes eight complementary, yet slightly 
different elements. Specifically, he noted that evaluation policy can describe (1) desired outcomes; 
(2) who is involved in evaluations, when, and under what circumstances; (3) the intention of the 
organization as building capacity to conduct evaluations; (4) the allocation of resources (financial 
and otherwise) and oversight; (5) varying responsibilities of participants; (6) procedures; (7) how 
results will be communicated and acted upon; and (8) how organizations might periodically assess 
the quality of their evaluations. 

These delineations provide useful categories for thinking about the different facets of evaluation 
policy and practice, but they fail to account for how these components might interact with and 
influence one another. Using a “systems lens” facilitates broader consideration of how evaluation 
policies might be constructed and leveraged to inform evaluation practice. Meadows (2008) 
describes a system as an “interconnected set of elements that is coherently organized in a way that 
achieves something” (p. 11). This definition captures two important ideas about evaluation policy: 
(1) It notes the interconnectedness of the various elements and (2) it surfaces the idea that these 
elements work together in the aim of accomplishing something. In short, a systems approach to 
evaluation policies encourages a consideration of why organizations prefer and prioritize certain 
evaluation practices. Constructing evaluation policies as a system geared toward reaching a particular 
outcome allows for a deeper examination of how evaluation policies influence evaluation 
practice, and ultimately, how organizations might more deliberately develop evaluation policies. 

While the field of evaluation has engaged in considerable speculation about the importance of 
evaluation policy as it relates to evaluation practice, few studies with empirical evidence exist to 
support these ideas. Importantly, the AEA (2007) and Trochim (2009) frameworks were both 
generated from a theoretical rather than empirical standpoint. While some studies have broadly 
examined the influence of evaluation policies on evaluation practice (e.g., Christie & Fierro, 2012; 
Summa & Toulemonde, 2002), to date, there has not been an intensive study about how evaluation 
policies are operationalized in an organization. This research investigated evaluation policy as it is 
described and enacted in situ. Specifically, this study examined the creation and enactment of 
evaluation policy at an organization that has made a strong commitment to evaluation and has 
developed and articulated an evaluation policy. The research provides an empirical complement 
to the theoretical work that has been done thus far about evaluation policy. 

This study sheds light on the ways in which evaluation policies interact with one another and 
relate to evaluation activities. Furthermore, the research explores the ways in which evaluation 
policies contribute to organizational learning and the practice of evaluation more generally. Finally, 
the article offers recommendations supporting the development of evaluation policies as a system 
rather than a set of discrete tasks and rules. 


Research Method 
Study Setting 


This study was conducted at the RWJF, which has been in operation for more than 40 years, and is 
the largest philanthropy dedicated solely to the nation’s health. RWJF has long been committed to 
evaluation efforts and is considered a leader among its peer organizations (Hall, 2004). The Founda- 
tion dedicates approximately 20% of its granted dollars to research and evaluation efforts. The 
Foundation’s tasks are divided into departments, some of which primarily function along program 
lines (e.g., research and evaluation and communications) and others that provide a broader base of 
general support (e.g., human resources and information technology). The primary unit of focus for 
the current study is the Research and Evaluation (R&E) department. 

RWJF supported this study partially due to its alignment with the Foundation’s organizational 
philosophy and orientation toward learning. The Foundation, and the Research, Evaluation, and 
Learning leaders in particular, have an interest in how results from this study might inform and 
shape future evaluation practice. Furthermore, RWJF has a long-standing tradition of sharing 
and learning from evaluation results regardless of whether or not they reflect positively on the 
organization’s funded programs. The present study was no exception. The leadership of RWJF 
offered valuable feedback on the study design and assisted by providing all requested documentation 
and making introductions and scheduling interviews with study participants. 


Phase I: Policy Identification Through Records Analysis and Interviews 


To understand the RWJF’s evaluation policies, the investigation drew upon a wide array of internal 
and public records, including reports, meeting minutes, anthologies or retrospective reports, framing 
documents, videos, and interviews. Other sources of evidence included publications in academic and 
practitioner journals, press coverage, and external studies about Foundation practices. Thirteen of 
the sixteen staff members in R&E participated in individual semi-structured interviews to illuminate 
informally held evaluation policies. Five Foundation leadership members, including the chief 
executive officer, chief of staff, and the associate vice presidents responsible for the two major strategic 
initiatives of the Foundation, also participated in the interviews. Participants had been at the 
Foundation from just under 1 year to nearly 17 years. Interviews ranged from 30 to 60 min. A second round 
of interviews revealed recurring patterns of themes, thus meeting the criteria of adequacy (Fossey, 
Harvey, McDermott, & Davidson, 2002). 


Phase II: Policy Implementation in Three Case Studies 


To explore the field implementation of evaluation policies, three RWJF-funded evaluations were 
selected for close examination.* The Foundation staff assisted the researchers in the purposeful 
sampling (Patton, 2002) of these cases to represent a range of complexity and perceived success. 
Researchers and Foundation staff selected these cases based on consensus concerning strengths and 
challenges and the availability of information. The researchers explored each case through a com- 
bination of document analysis and interviews (Yin, 2009). The Foundation maintains an internal 
program information and management system, which tracks funded programs and the records 
produced under its auspices. Documents analyzed in the case studies—including requests for pro- 
posals (RFPs), responses to those requests, contracts or agreements, interim and final reports, and 
published products—were primarily drawn from this system. Key players in each evaluation parti- 
cipated in interviews, including Foundation program management staff, R&E staff responsible for 
overseeing the evaluations, and external evaluators contracted by the Foundation. Some Foundation 
leaders and program operators also participated. With few exceptions, case study interviews were 
60 min in length and were conducted via telephone. Table 1 presents an overall summary of the 
cases. Each case is also described in detail. 


Case 1: Smooth case. Fun and Games is a school-based, inclusive play sports program designed to 
increase student engagement and improve school climate. The program received several million 
dollars from the Foundation to replicate in four cities over 3 years. At the time, Fun and Games had 
little experience with evaluation, and the program leadership was skeptical of its value. Early on, the 
Foundation did not attach an evaluation to the project, and the program relied heavily on satisfaction 
surveys and anecdotal evidence. After the initial 3-year period, the Foundation decided to expand the 
program to 27 cities through a considerable grant, and it was at this point that it became a significant 
enough investment to warrant an evaluation. 

The Foundation initially conducted a 1-year, mixed-methods implementation study to assess and 
compare procedures at 12 schools. Once interim findings were reported, the Foundation commis- 
sioned a randomized controlled trial (RCT) designed to measure program effectiveness. The R&E 
program officer worked closely with the implementation evaluator to develop the RFP, which 
included a requirement that selected evaluators partner with the research center involved in the 
implementation evaluation. The implementation evaluator helped to determine which firm would 
conduct the impact study. 

The RCT impact assessment included teacher and student surveys, physical activity trackers, and 
school records. The implementation evaluator submitted a list of anticipated deliverables for the 
project, including policy briefs and final reports to the Foundation, study participants, and publicly 
available data repositories. The project is ongoing and has continued to expand. Information from 
evaluations has been shared via the web, and at the time of data collection, several journal articles 
were being prepared for publication. 


Table 1. Comparison of Case Study Elements. 

| Case | Smooth | Challenging | Complex |
|---|---|---|---|
| Program description | School-based program intended to affect student-level outcomes through inclusive play | Child-care program designed to reduce risk factors promoting drug use through strengthening communities | Building community-wide multistakeholder partnerships to improve patient health and health care |
| Program start date | 1996 | 1992 | 2006 |
| Evaluation time period | 2004-present | 2001-2005 | 2006-present |
| Percentage of program funds spent on evaluation | 6 | 32 | 12 |
| Evaluator selection method | Implementation: sole sourced; impact: invited bid | Invited bid, vetted through evaluation advisory committee | Sole sourced |
| Implementation evaluation | Compare program implementation across school sites | Document model and implementation processes to inform scale-up | Implementation and impact evaluations are concurrent and ongoing and examine extent to which communities report quality and effects that has on consumer decision-making |
| Impact evaluation | Randomized control trial | Quasi-experimental design | (Concurrent with the implementation evaluation) |
| Findings | Program resulted in positive changes for students and schools | Inconclusive, could not demonstrate clear effects | In process |


Case 2: Challenging case. The Drug Free Families program sought to prevent substance abuse by 
strengthening at-risk communities and reducing risk factors through early childhood centers. The 
program began as a pilot in five geographically diverse sites. Families received case management, 
peer mentoring, and counseling. Community efforts included advocacy, forums, and community 
action projects; neighborhood revitalization; and agreements with substance abuse organizations to 
treat referrals from the program. A pilot evaluation documented the development of the model and 
the implementation process, with an eye toward broader implementation. Interview participants 
indicated that the program was struggling to define its activities and develop a concrete model, and 
in hindsight, the program was not ready for a large-scale roll out or an impact study. Nevertheless, 
despite some reservations, the Foundation proceeded with the demonstration phase and attendant 
evaluation. 

An evaluation advisory panel oversaw both the program and evaluation designs. The program 
director, who had a strong research background, also contributed to the evaluation design. The 
advisory panel members believed that having a more uniform intervention in place across sites 
would allow for a more rigorous investigation into the program’s impacts. The evaluator from the 
implementation evaluation offered recommendations to inform the demonstration program and 
evaluation. Program implementation varied widely from site to site. Moreover, the national political 
context surrounding the child-care centers changed drastically (including a shift in focus and a 
significant decrease in funding), and site staff lacked experience with, and capacity for, developing 
large-scale programs. 

Fourteen of the fifteen demonstration sites participated in the impact evaluation. They were 
matched to comparison sites to facilitate a quasi-experimental design. Unfortunately, program 
activities were not well aligned to the stated goals and proved difficult to measure. The ambitious 
scope of the program and the multiple and complex contextual factors that influenced implemen- 
tation and the potential for impact further complicated matters. Program personnel were concerned 
that the evaluation model was too “academic,” and that it failed to capture the program’s “real- 
world impacts.” Evaluators (including the advisory panel) felt that the research design was rigorous 
and appropriate, but that the inconsistencies across implementation “made it difficult for the eva- 
luators to evaluate the conceptual model uniformly.” 

Even though implementation varied widely, most sites were able to execute various strategies 
with some degree of success, and the program contributed to establishing community partnerships 
between target organizations and participating child-care centers. On measures of impact, however, 
there was little or no evidence that treatment sites were more able than comparison sites to bring 
about changes to families. The program closed after the demonstration phase. 


Case 3: Complex case. The Holistic Health program is one of the Foundation’s largest programs and 
evaluations currently in operation. It “builds multistakeholder partnerships—among insurers, pro- 
viders, purchasers, and consumers—for the purpose of improving patient health and health care.” 
The program was initially in 4 markets and is now in 16. The Foundation scaled up after less than 
1 year and asked communities to submit proposals for the expansion. The evaluator explained, 
“[The Foundation] basically said, ‘we are going to scale this up’ before there was ever an 
opportunity to even learn from the pilots.” Implementation sites and program operators had varying prior 
experience with evaluation. 

Unlike in the other two cases, one evaluation team was commissioned for the duration of the 
program. Implementation and impact studies were integrated, and the evaluation continually 
evolved to match the changing and expanding program. The primary evaluator was a university 
faculty member who was directly identified by the Foundation before any of the individual sites 
were funded, and ideas for how the program would operate were still nascent. Because the program 
was so large, the evaluator assembled a team of experts; as many as 40 people have worked in some 
capacity on the evaluation. Program and evaluation activities are ongoing. The evaluation team has 
published multiple issue briefs and approximately 60 peer-reviewed articles. 


Analytic Procedures 


Data analysis occurred in iterative phases during and after data collection. Two existing frameworks 
informed the coding scheme: the AEA’s definition of evaluation policy and Trochim’s (2009) 
evaluation policy wheel (the primary components of both are described above). These frameworks 
identified and defined the key concepts of evaluation policy. The coding scheme was organized 
according to steps and components commonly found in typical evaluations, from contracting and 
planning to data collection and analysis, and reporting findings. Four raters assisted in piloting the 
framework through a preliminary examination of the Foundation’s policy records. The raters were 
assigned different documents to code. At in-person meetings over a 6-week period, the framework 
was refined and finalized by deleting, adding, and/or more specifically defining individual codes 
(Dillman, 2014). 

Analysis of records and interviews began with attribute coding (Saldaña, 2009) to note basic 
descriptive information. The process began with a predetermined “start list” of codes (Miles & 
Huberman, 1994), which were “revised, modified, deleted, or expanded to include new codes” 
(Saldaña, 2009, p. 144) as needed. Using Dedoose Version 6.1.18 software, the researchers recorded 
information about record attributes and selected and tagged excerpts with provisional codes. Next, 
elaborative coding was employed to analyze the data contextually and further develop the theory. 
This coding approach is appropriate for qualitative studies that build on previous investigations to 
“support, strengthen, modify, or disconfirm the findings” (Saldaña, 2009, p. 229). 

The theoretical propositions that guided the research questions framed the case study analysis 
(Yin, 2009). The first step was descriptive to present a comprehensive account of the relevant 
features in these cases. Theoretical analysis followed with application of the general coding cate- 
gories to excerpts from case study documents and interviews. This allowed for a cross-case synthesis 
according to different evaluation policy domains and facilitated a robust discussion of the imple- 
mentation of various Foundation evaluation policies. 


Findings: The Evaluation Policies of the RWJF 


The first set of findings describes the policies that the Foundation has articulated. The RWJF uses 
instructional videos and documents to communicate its evaluation policies to its employees, eva- 
luators, grantees, and the general public. These sources include explicit policy information, but have 
various primary foci, including operations, reporting, and stakeholders. It is also common for the 
Foundation to articulate its evaluation policies more informally. Thus, the findings in this section are 
drawn from formal records (videos and documents) and also from interviews with Foundation staff. 


Evaluation Goals and Assumptions 


Formal records and discussions reveal that of primary importance to the Foundation is the use of its 
evaluation findings whether in practice or policy. Foundation representatives described the impor- 
tance of assessing the strengths and weaknesses of the Foundation’s strategies, understanding the 
impact of its work, spreading effective approaches, ensuring its own credibility, and promoting 
social change. The R&E staff assume evaluation is a challenging activity that encounters many 
obstacles as the work unfolds, including the political nature of all evaluation activities. The Founda- 
tion also assumes that stakeholder involvement (particularly in the development of evaluation 
questions) leads to more meaningful results that will be implemented with less resistance. 

Foundation discussion of evaluation policy went far beyond these goals and assumptions, how- 
ever, to address more practical aspects of the work, including decisions about what/when to evaluate, 
selection of evaluators, design and execution of the evaluation, administration, stakeholders, and 
reporting of findings. These areas are discussed in turn. 


Deciding What and When to Evaluate 


The Foundation is strategic about what types of evaluations should be scheduled and when. Imple- 
mentation evaluations are desirable in early stages of program development. Once a program is 
mature, the Foundation undertakes outcome evaluations. When multiple mature programs with 
similar objectives have been operating for a while, the Foundation may conduct a cross-initiative 
evaluation to enhance understanding of particular ideas or approaches to certain problems. 

Earlier in the Foundation’s history, staff considered evaluation as separate from program activ- 
ities, and evaluations were funded to see if a program “worked” as activities were drawing to a 
close. A program officer explained that this model “had a certain elegance to it because you get a lot 
of benefit from hindsight.” By the time results from those evaluations came in, however, it was often 
too late for them to be actionable, and the Foundation’s priorities and objectives might have changed 
in the intervening time. More recently, evaluation activities begin in concert with the start of 
program operations, and evaluators act more as collaborators with the program operators. 
As a program officer shared, “Once [evaluators] discover something, shouldn’t they share that with 
the program that we’re trying to make better?” Even though Foundation staff agreed this is a positive 
change, there are some challenges related to the fact that programs frequently evolve in the early part 
of their existence. 

Evaluations are sometimes tied to program budgets (e.g., any program with an operating budget 
over US$400,000 would receive an evaluation) or to the connection between the program and the 
strategic objectives of the Foundation. The Foundation makes every effort to inform programs in 
careful detail about the expectation that they must fully participate in research and evaluation 
activities in order to secure Foundation funding. 


Selecting and Hiring Evaluators 


The Foundation’s written policies address logistical aspects of selecting evaluators, including how 
RFPs should be developed and the criteria upon which they will be evaluated. The policies also 
describe the importance of realistic timetables and other project management issues. The Foundation 
currently dedicates 20% of grant dollars to research and evaluation. While not an explicit policy, this 
figure is quoted in the framing document that describes the Foundation’s research and evaluation 
efforts. According to interviews with Foundation staff, budgets for evaluation are determined on a 
team-by-team basis whether through a top-down or bottom-up approach. 

The Foundation assigns one of its R&E program officers to each project being evaluated, and this 
individual selects an independent evaluator to enhance external credibility and avoid conflicts of 
interest. R&E program officers identify evaluators either through an invited bid process or by direct 
contact. There is no clear written guidance about when to use each method; however, there seems to 
be an implicit preference for the invited bid process. Beyond having a track record for balancing on- 
time completion of deliverables, there was little discussion of evaluator characteristics in the 
records. Foundation leadership expressed a concern that the currently available pool lacks diversity 
in terms of ethnicity, socioeconomic status, and so on. One individual noted it is important for 
evaluators to match the population they are evaluating; relatedly, with a limited pool, ideas and 
evaluation approaches are also less diverse. At the time of this study, the Foundation was undergoing 
the process of identifying and vetting firms for inclusion in an evaluator database, envisioned as a 
resource for Foundation R&E officers to consult when inviting evaluators to bid on projects to 
partially address this concern. 

RWJF evaluations are collaborations among evaluators, Foundation program officers, and 
Foundation R&E officers. Several program officers noted this provides richer data and better 
communication, which results in greater program staff buy-in and support of evaluation findings. Previously, 
there had been a “firewall” between the evaluators and the program. This led to several challenges, 
however, and so the current collaborative model was introduced. 


Designing and Conducting the Evaluation 


Codes pertaining to design and conduct of evaluations were the most represented in the overall 
framework. The specific categories nested within this idea provide a chronological walk through the 
process of conducting an evaluation—from understanding the program to design, data collection, 
and analysis. 

The Foundation places great value in logic models, which are visual depictions of program inputs, 
activities, outcomes (short-, medium-, and long-term), and the logical connections between those 
elements. The Foundation believes stakeholders should be involved in logic model development to increase clarity 
about the program and reach a common understanding to inform the evaluation questions: “Good 
questions, when they are thoughtful and well informed given the range of perspectives that went into 
developing them, are more likely to yield findings that are useful, relevant, and credible.” R&E 
program officers generally initiate the evaluation design process, often developing research ques- 
tions before identifying an evaluator to do the work. 

The Foundation views the randomized controlled trial (RCT) as the most credible type of outcome 
study, although it recognizes that contextual factors may prevent appropriate use of this design. 
However, in all cases, the Foundation maintains that the approach should be rigorous and geared 
toward reducing bias. Some interview participants indicated specific methodological preferences 
based on their training and noted that the intended audience for the evaluation findings might also 
influence design choices and methods. Not surprisingly, the Foundation prefers claims that are well 
supported by evidence, conclusions that fit the analysis, and descriptions that include enough 
information such that readers can draw inferences. In short, the Foundation promotes evaluations 
with findings that are constructive, impartial, useful, relevant, and credible. 


Administration 


The Foundation expects that proposals will include “detailed information regarding how the project 
will be organized [and] which employee is responsible for assuring adherence to project schedules, 
monitoring expenditures, and addressing delays.” Evaluators work with Foundation staff to 
determine research questions and design but are ultimately responsible for setting the mission and vision 
of the project. Evaluators are expected to be able to “see both the forest and the trees.” 

The Foundation encourages program personnel to play an active part in the rollout of evaluation 
activities. Programs are expected to provide evaluators with data to support the evaluation and, in 
some cases, to assist with primary data collection. Program officers do not interfere with evaluation 
activities once the work is underway, and particularly as conclusions are being reached and findings 
are being shared. The Foundation assumes that greater frequency and higher quality communication 
between program personnel and evaluators create a smoother and more effective process. RWJF 
wants to promote reciprocal relationships between the evaluator and the program characterized by 
trust and mutual respect through an equitable balance of power, timely feedback about the program, 
formalized work plans and communication protocols, and clear procedures for resolving 
misunderstandings. 


Stakeholders 


The Foundation emphasizes giving stakeholders decision-making power in evaluations to promote 
their eventual use of evaluation findings. Stakeholders might be program managers or developers, or 
those who are in some other way responsible for the initiative’s success. The Foundation believes 
that the process of selecting stakeholders to participate in the evaluation should be attentive to the 
attitudes, beliefs, and knowledge of individuals. Stakeholders ideally provide buy-in and support for
evaluation activities and have great interest in the issues being examined. The Foundation values 
diverse perspectives and cultural, religious, ethnic, and geographical backgrounds. It recognizes that 
those with more direct experience are better informants about program operations, and their invol- 
vement can strengthen and solidify a common program understanding among all relevant groups. 


Reporting Findings 

The RWJF engages in several types of reporting at the program level and annually across programs. Its 
policies indicate to whom and how particular aspects of the evaluation should be shared. Of primary 
importance is “providing high-quality, objective information.” Results should document successes 
and challenges, measure the magnitude of these effects, and assess what the effects mean for the 
organization and other parties. Audiences are internal (program officers, management, board of
trustees, and key program stakeholders) and external (public and private decision-makers, academics, 
researchers working in similar areas, the general public, and stakeholders in other programs). 

The Foundation stipulates that reports should be formatted in an understandable way, including an 
executive summary, a brief and clear body, and an appendix that includes the more technical details. 
The Foundation also seeks to share findings in peer-reviewed journals and through the Foundation 
website. For larger Foundation objectives, reporting should speak to the health of program develop- 
ment, services to grantees, and the Foundation’s impact on its areas of interest. For program-level 
reporting, the Foundation provides a specific and detailed outline of an ideal report. At the conclusion 
of each program and evaluation, a “Program Results Report,” compiled by Foundation officers not 
directly involved in the particular program or its evaluation, describes what happened in the program 
and what was found out about it. These reports are shared publicly on the Foundation website. 


Overall Evaluation Policies 


These findings illuminate specific policy preferences held by the Foundation, but the degree to 
which they are articulated as such varies across categories. Often, when more consistently articu- 
lated policies emerge, there is an explanation underlying why particular choices are preferred. For 
example, a high degree of stakeholder involvement is preferred because of the assumed connection 
between involving stakeholders and the ultimate use of evaluation findings. The Foundation prefers 
more rigorous evaluation designs because of a preference for publishing findings in academic 
journals. Articulation seems to be strengthened when strong reasoning is provided along with the 
policies, suggesting that individual explicit articulation of policies might not be as important as 
consideration of policies as an integrated system grounded in the intended outcomes. 


Findings: The Implementation of Evaluation Policy 


The case studies have a high degree of consistency in implementation of evaluation policy goal 
statements. There is, however, less alignment concerning operational policies, which are imple- 
mented on a case-by-case basis and are largely informal and unwritten. Evaluation policies are 
implemented differently across the three cases. As a program officer summarized, “I think the policy
is to try to use the best design for the situation.” 

That programs interpret policies differently points to an alternate understanding of the utility of 
evaluation policies. Rather than a specific articulation of rules and guidelines, the Foundation might 
consider evaluation policies as a system based on guiding principles. Such a system might enhance 
the utility of evaluation policies without making them too restrictive. 


Implementation of Evaluation Policy Goals 


The evaluation policy goals reflected in the case studies reveal the Foundation’s emphasis on use of 
findings. For example, Holistic Health represented a substantial investment of Foundation assets, 
resulting in significant efforts to use evaluation findings to inform the field. Even though the pilot 
and implementation study phase of the program was truncated, there were opportunities to use evaluation
findings to support program improvements. The evaluator explained, “[The program has] made
programmatic changes ... I think our work has contributed to their thinking and the program 
changed as a result.” 

Likewise, findings from the initial implementation study for Fun and Games, the smooth case, 
were used in a variety of ways to inform both specific improvements in program delivery and the 
strategy used to scale up the program. Part of the reason that this model was so successful was that 
program leadership was receptive to the information. The evaluator shared: “[The program] didn’t 
want to hear that everything is great and they’re doing an awesome job, but they really wanted to
hear what was working.” The program was also able to use evaluation findings to aid its marketing, 
funding, and expansion efforts. The evaluator explained—and program personnel confirmed— 
“There is a lot of collateral impact from that evaluation.” 

Drug Free Families proved to be a more challenging example. The context surrounding the 
program suggested there was an opportunity to directly influence federal policy and appropriations 
through demonstration of an effective program, and the desire to capitalize on this moment strongly 
influenced the design of the evaluation. Ultimately, the program director felt the evaluator’s interests 
in using the findings to influence high-level policy decisions were at odds with the Foundation’s 
more practical interests. The ability to capture more nuanced findings that suggested positive 
program impact was sacrificed. The findings could not support the larger political aim, and the 
program director’s interests for what the evaluation could accomplish limited opportunities for the 
program and Foundation to learn from the evaluation. 

If we were, instead, to consider evaluation policies as a system, the evaluation policy goals would 
anchor the development of the remaining guiding principles for evaluation practice. The RWJF is a 
model in this area: the Foundation’s commitment to what it hopes to achieve through evaluation is
consistently understood and implemented by staff. The Foundation could use these widely understood 
goals as the basis for developing an evaluation policy system, consisting of interrelated components that 
detail various aspects of evaluation practice. The evaluation policy system would then describe how the 
RWJF’s preferred evaluation practices are intended to work together to achieve the stated goals. 


Implementation of Policies on Deciding What and When to Evaluate in Practice 


As noted earlier, it is now Foundation policy to begin evaluations with the start of program activities. 
This worked well for Fun and Games. An initial implementation study was commissioned to inform 
the program as it moved into the demonstration phase; as this occurred, an impact study was 
commissioned, informed by the earlier work. The Foundation required that the implementation 
evaluators be included as partners in the impact study to ensure a smooth transition across the two 
efforts. In the more complex case of Holistic Health, the program scaled up much faster than 
originally planned, which meant the evaluation had to be modified significantly to account for 
programmatic changes. Although ultimately everyone was very satisfied with the process, it was 
not as smooth as in the Fun and Games case. 

The evaluation of the Drug Free Families program occurred prior to the modification in the 
policy, and thus, the evaluation was commissioned after program activities were underway. The 
Foundation first funded an implementation study when the program was in its pilot phase at five 
sites. An impact study followed, once the program expanded to 15 sites during the demonstration 
phase. The implementation evaluation results had very little bearing on the program scaling efforts 
or the impact evaluation that followed. The order of implementation study followed by impact study 
did technically follow the Foundation’s policy, but it did not play out successfully in the field. 

This examination of the implementation of policies related to deciding what and when to evaluate 
affords insight into what happens as evaluation policies change. The case studies show the evolution 
of the timing of evaluations related to the launch of a program—a decision that was again rooted 
firmly in an underlying philosophy. This philosophy—that evaluations are better suited to facilitate 
program improvements when they start as programs are initially implemented—serves as a guiding 
principle for the decisions about evaluation that follow. 


Implementation of Policies on Funding and Selecting Evaluators 


Foundation research and evaluation framing documents indicate that 20% of overall spending should be
devoted to research and evaluation efforts (although the division of monies between these two 
activities is not specified). In the smooth case, Fun and Games, evaluation costs represented 6% of
the overall budget. In the challenging case, Drug Free Families, 32% of total costs went to evalua- 
tion, and in the large and complex case, Holistic Health, evaluation costs made up 12% of the overall 
budget. These percentages do not tell the complete story, however, as they represent portions of 
program budgets that vary considerably. Considering the amount spent in actual dollars across the
three evaluations reveals that nearly 44 times more resources were expended on the complex case 
than the challenging case and nearly 13 times more than on the smooth case. 

Fun and Games, the smooth case, had two separate evaluators. The evaluator for the implemen- 
tation study was sole sourced by the R&E program officer based on expertise and experience. The 
impact study was contracted through an invited bid, and RFP development was led by the R&E 
program officer in collaboration with program personnel and the evaluator from the implementation 
study. The selected evaluators were required to work with the research center that conducted the 
implementation study. A team member explained, “Because we were so familiar with the program 
already, [the Foundation] wanted to keep us in there.” According to the R&E officer, this involve- 
ment meant the evaluator “would be comfortable immediately so you wouldn’t have all kinds of 
problems.” Once the evaluator was selected, the Foundation and the evaluator agreed on a “précis” 
that described the scope of work, evaluation activities, project deliverables, and budget. 

The more complex Holistic Health evaluation was sole sourced. When the Foundation was doing 
some advance work to introduce the program, the evaluator was informally consulted on program 
design and, at one point in this process, was asked to write an evaluation proposal. Over the life of 
the program and the evaluation, there were three separate contracts. Each was followed by a précis
in the same basic format described above.

For Drug Free Families, the evaluation advisory panel oversaw the selection of the evaluator. 
They interviewed two teams, and the process was described as both formal and participatory, 
involving the advisory committee and the program director. According to one Foundation leader, 
“Everything about this program was elaborate ... It was a big investment and it was something that 
everybody cared about.” Of the three cases, this was the most rigorous selection process with the 
greatest number of proposals (10-12) solicited. This case most closely matched the Foundation’s 
stated preferred method for contracting an evaluator. The evaluation also ended up being the most 
problematic of the three investigated in this study, however, in part because—as the program 
director explained—the evaluator was not the right fit. Furthermore, among the three programs, the 
greatest proportion of resources was dedicated to the evaluation, while the evaluation with the 
smallest relative budget was the smooth case. Although it is not possible to generalize based on 
only three instances, this does suggest that, in some cases, evaluation funding level might not be an 
important determinant of evaluation quality. 

Selecting evaluators, however, is an area that might benefit from a higher-level, intentional
systems approach to developing evaluation policies. Enforcing specific rules that dictate the exact
process for selecting evaluators would prove too cumbersome. Rather, the Foundation could focus
its energy on making decisions about the conditions under which it desires particular approaches to
selecting evaluators as well as the ultimate goals for the particular evaluation. This could bring more 
consistency to the selection of evaluators and potentially allow for more diversity in the pool of 
people and organizations that conduct evaluations. 


Implementation of Policies on Designing and Conducting Evaluations


The design of the evaluation for Drug Free Families was challenging from the outset, in part because 
the ultimate desired outcomes (less substance abuse) could not be immediately ascertained. The 
program’s interventions were directed at very young children as a means of preventing substance 
abuse later in their adolescent and adult years. As one member of the Foundation leadership put it, 
“That’s sort of an evaluator’s nightmare: The ultimate effects are in flux.” Drawing on input from
the program office and the advisory panel, the evaluation instead sought to measure intervening 
indicators at the family and community levels. 

The most rigorous aspect of the Drug Free Families evaluation was a quasi-experimental impact 
study that used matched communities to measure process and impact. According to the evaluator, 
“We couldn’t do random assignment, but we wanted to have the strongest design in order to be able 
to have fairly strong conclusions at the end.” Foundation program and R&E officers, the chair of the 
evaluation advisory panel, other panel members, the evaluator from the implementation study, and 
the impact evaluators were all involved in the design development, which was described as a 
“negotiation.” The evaluator explained, “there was a lot of evolution of it before the design was 
finalized.” Ultimately, even though the results were disappointing, everyone except the program
director considered the evaluation design to be strong. (The program director felt that the design was 
not sensitive enough to capture change.) 

Fun and Games was initially evaluated through an implementation study described as “pretty 
open ended” and “qualitative.” The 1-year study was designed collaboratively among evaluators 
and program staff. The evaluator explained, “If your work is going to be useful, you have to find out 
from the partners what will be helpful to them.” The results informed the design of the impact study, 
which maintained a focus on program processes and added aspects of outcomes. The impact study 
design was finalized after the evaluator was selected. The Foundation, implementation evaluator, 
and program personnel were all involved in this process. A Foundation officer explained, “I think 
that’s a great lesson learned for how to think about structuring evaluations .... It’s not an evaluation 
designing the program; it’s an evaluation working alongside of the program strategy being devel- 
oped.” According to the program director, “the goals were to document the outcomes using a 
randomized design so that we could communicate to the world about our impact.” Structures were 
in place to support the RCT design and to allow schools to be matched to one another prior to 
randomization. The design was further influenced by the desire to publish results in a national 
database. To qualify for the database, research needed to take place in 32 schools (vs. the original 
20). After a presentation to Foundation staff, the R&E program officer was authorized to increase the 
evaluation budget to include the larger number of schools. 

The evaluation of the Holistic Health program utilized a quasi-experimental design (often 
employed when randomization is not possible or appropriate). A specific challenge in this case was 
that the initiative was still evolving. The evaluator explained: 


It’s trying to make academic and scientific sense of the messy real-world realistic evaluation .... This 
isn’t an experiment. Nothing is randomized here. So we’re trying to ... [do] it in a way that is 
scientifically credible and will meet the merits of peer review. 


The evaluation was designed collaboratively by Foundation staff, the program office, and eva- 
luators but was also informed by the evaluator’s own goals: “The goal was to design a study that 
would have scientific merit and allow us to publish.” The early introduction of the evaluator 
facilitated participation in the early discussions about what the program was supposed to accom- 
plish, allowing for easier adaptation to changing circumstances, such as when the Foundation 
decided to scale up the project in under a year. 

Designing and executing evaluations is an area in which overly specific policies might prove to 
be more of a hindrance than a help to the Foundation’s evaluation efforts. Instead, a set of guiding 
principles that outline decisions that need to be made and the criteria that should be considered when 
making those decisions would better serve the Foundation’s efforts. These criteria could be rooted
in some of the other policy areas including goals and stakeholder involvement to further strengthen 
the evaluation policies as a system. 


Implementation of Policies on Collaborating and Communicating


In the Holistic Health program, a Foundation leadership member described the collaboration as “a 
constant engagement of people at the Foundation, between the people who are managing the 
program and the people who are evaluating it.” Fun and Games also benefitted greatly from
collaboration largely because of the relationships between the implementation and impact evalua- 
tions and the ongoing participation of the program director in both evaluations. Relationships were 
characterized by respect and a focus on common goals. 

Collaboration efforts were generally supported by ongoing communication typically brokered by 
the Foundation. All of the evaluations had some sort of preestablished plan about how and when 
various parties would communicate about evaluation process and progress. In all cases, there was 
generally more contact in the beginning when evaluations were being designed and at the end when 
devising a reporting strategy. In the Drug Free Families program, the points of contact increased in 
response to some of the difficulties experienced, but the collaboration fell apart as results from the 
evaluation were disappointing to both the Foundation and the program. Some evaluators across cases 
indicated that this level of communication was unusual compared to their other evaluation experi- 
ences. One noted, 


I was used to the sort of prior grants where you get the grant, and then you’d go do your thing. I didn’t 
want anybody being able to influence that at the Foundation. But after going through it, it was critical 
because we would need information on program changes. 


The Foundation’s clear promotion of regular and ongoing communication among evaluators,
program operators, and R&E officers is an example of how policies operate implicitly regardless of
whether they are explicitly articulated. Even though this area of evaluation is functioning consistently
at the Foundation, articulating guiding principles about the expected frequency and content of 
communications might still prove beneficial. As an example, evaluators bidding on the contract 
would have more information that could help them develop their scope and budget for the work with 
greater accuracy. 


Implementation of Evaluation Policies on Involving Stakeholders 


The emphasis on specific strategies for stakeholder involvement in Foundation records was not 
mirrored in the case studies. Rather, involvement was determined based on specific program needs 
and the context at hand. In the Holistic Health program, for instance, R&E program officers 
identified the Foundation, the program communities, communities doing similar work, researchers 
in health systems, federal policy makers, and those trying to improve the quality of health systems as 
stakeholders. The evaluator largely echoed this list. The relationships between the evaluator and the 
intervention communities evolved naturally, as there was frequent and close contact over an 
extended period of time. The evaluator explained how the communities were engaged in the study, 
specifically through “sharing our research findings, and meeting requests that [the communities] 
have for information.” The evaluation team worked with stakeholders to share findings in speeches 
or at board meetings and consulted them for “feedback on different aspects of the evaluation and its 
design.” The R&E program officer said the evaluator had good relationships with Foundation staff 
and the national program office, and that developing these relationships was critical because it 
provided a means for collecting data. The program officer “worked really closely with the evaluator 
and his folks and tried very hard to facilitate direct contact between the program folks and the 
evaluator.” 

In the Fun and Games case, the two evaluators identified similar lists of stakeholders, including 
the program, the Foundation, schools, and policy makers. In the implementation study, the program 
was considered the primary stakeholder, whereas in the impact study, it was the Foundation. The
R&E officer helped build relationships between groups early in the process. The nature of
relationships and extent of involvement with other stakeholder groups were less clear. In some cases,
the evaluator relied on school administrators for access to data, and evaluators provided district-level 
reports when interest warranted it. Relationships seemed to unfold as needed rather than as a result of 
a comprehensive stakeholder engagement strategy. 

Many of the same broad categories of stakeholders were named in the Drug Free Families 
program, but there was a more explicit and direct focus on the federal government because the 
program was trying to influence and shape public policy. The relationship between the program 
director and the evaluators was contentious. Despite being directly involved in the selection of the 
evaluation team, the program director felt that the evaluation was not meeting the program’s 
needs. The program director had very specific goals in mind related to advocating for the program 
to the federal government. Despite oversight and involvement from an advisory panel that 
mediated some of the conflict, and the evaluators’ concerted efforts, the relationship issues were 
never resolved. 

In considering involving stakeholders, we see an example of where evaluation policies are very 
explicit, yet do not lead to consistent implementation. However, inconsistent implementation does 
not suggest a failure of the policy. Involving stakeholders in evaluation activities is a strength of the 
Foundation, as seen in all of these cases. The inconsistent implementation instead suggests that 
policies as they are articulated might not be flexible enough to tolerate necessary modifications in 
stakeholder involvement across evaluation efforts. This again points to a need for guiding principles 
and a system rather than overly codified evaluation policies. 


Evaluation Policy Implementation for Reporting Findings 


The information gathered through the case studies suggests that many reporting decisions lie with 
the evaluators. In proposals, evaluators are asked to outline their deliverables. The examples
consulted described expected deliverables in broad and vague terms, although they did demonstrate
that the Foundation prioritizes the publication of results in academic or peer-reviewed
journals. Other types of reporting included interim and annual reports, policy briefings, presenta- 
tions, and reports tailored to communities that received the intervention. 

In Holistic Health, the primary method of reporting has been peer-reviewed journal articles. The 
evaluator estimated that they have produced approximately 60 publications. Not all are related to 
evaluation findings per se, but the articles build an evidence base around a strategy, which is a 
Foundation priority. The evaluation team has also submitted interim and annual reports to the 
Foundation and prepared issue briefs available on the web. They have made presentations to 
treatment communities and produced customized community-specific summaries. The R&E officer 
noted the evaluation team decides where to publish their findings. 

In the smooth case, Fun and Games, the evaluator described the deliverables as “an interim, final
report, and three issue briefs, multiple presentations, and a research article for a journal.” The 
evaluator reports findings simultaneously to the program and the Foundation. Additionally, several 
school districts have requested specific reports from the impact study. Much of the reporting has 
been geared toward policy makers, however. A Foundation leadership member noted, “we’ll have
briefings with policy makers on Capitol Hill so that they know of the program.” In keeping with the
overall feel of the project, the process of reporting findings has been collaborative. The evaluators 
made initial decisions about the content and structure of policy briefs, and the Foundation helped 
make them more publicly accessible in terms of length, appearance, and language. The program 
director explained that this was not without challenges: “Evaluators want to speak very carefully and 
conservatively about what the findings mean, and we [the program] want to speak very liberally and 
broadly.” The Foundation R&E officer has played an important role in helping the program and the
evaluators determine language that met the needs of both parties. 

The overall difficulties experienced in Drug Free Families extended to the reporting phase as 
well. Each year, the evaluators submitted an annual report. The chief product available that sum- 
marizes this program is the Program Results Report, which is on the Foundation website. There were 
far fewer products submitted for peer review than in the other two evaluations. When the final 
evaluation report was submitted, the evaluators offered to meet with the program office, but that 
meeting was never held due to exigent circumstances. 

Reporting is another area in which RWJF has been more intentionally explicit. However, while the
expected reporting format for the final report due to the Foundation is clearly outlined, instructions
are more implicit when it comes to dissemination beyond the Foundation. More guidelines about 
Foundation expectations might assist evaluators with their planning. Furthermore, upfront guide- 
lines and conversations about dissemination activities would surface considerations of audience for 
eventual findings, which might influence evaluation design issues. 

This examination of evaluation policies at RWJF has pointed out some significant strengths 
that include: a clear emphasis on the importance of evaluation; building evaluation into the 
major social improvement initiatives; very strong collegiality in defining, formulating, and 
monitoring evaluations; using a variety of evaluation products (reports, briefings, and point 
papers) to get evaluation results to users; formal and enlightened report writing guidelines; and 
use of evaluation advisory committees to strengthen evaluation approaches and methods. On 
the other hand, these policies have evolved piecemeal over time rather than as a result of a 
deliberate evaluation policy strategy. As a consequence, some written policies are absent or 
inadequate, and some policies are followed with less consistency than others. This might be an 
opportune time for the Foundation to reflect and engage in discussions about the trade-offs of
formality, uniformity, and flexibility. An approach to building evaluation policies as a system 
with guiding principles rooted in why certain activities are more desirable than others would 
facilitate this conversation. 


Evaluation Policies as a System 


This study has implications for evaluation policy across several levels: (1) the ways individual 
policies affect evaluation practice, (2) the implications policies have at an organizational level, and 
(3) evaluation policies as a system. 

Examining individual policies offers an opportunity to explore how those policies affect evalua- 
tion practice. For example, the Foundation’s policy of selecting evaluators through sole sourcing or 
RFPs sent to preselected evaluators or firms means they deal with a limited pool of evaluators. 
Again, this practice makes sense—as one program officer put it, “this isn’t amateur hour,” and this 
approach minimizes the risk that the evaluators will be unqualified. An un- or underqualified evaluator,
in the most severe cases, may result in evaluation malpractice and wasted investment in very high stakes
settings. Furthermore, processes involved in running completely open bids are time-consuming and
prohibitive in terms of resources (both time and staff).

However, these concerns should be weighed against the consequences of drawing from a limited 
evaluator pool—that it may result in a lack of diversity in ideas. Moreover, research design ideas and 
evaluation questions almost always originate with the Foundation R&E officer. In spite of back-and- 
forth between the evaluator and the program to finalize the questions, there was no evidence in the 
case study data that questions ever changed dramatically. Thus, a similar group of people asks a 
similar set of questions, just across different contexts. Not only does this stand in contrast to the 
Foundation’s articulated appreciation for diversity, but a potential to explore innovative ideas and 
approaches is missed. The challenge here is developing policies that support a balance between 
ensuring high quality and resource efficiency in high-stakes evaluations and allowing enough 
flexibility to support diversity and innovation. 

In considering the evaluation policies at an organizational level, we discover implications 
that speak to how organizational functioning is shaped through evaluation policies. At the 
Foundation, tremendous consistency runs across the Foundation’s documents and personnel 
when it comes to describing evaluation policy goals; however, evaluation operational decisions 
are decentralized. Each R&E program officer was able to make evaluation activity decisions 
based on his or her interests and objectives and the particular context at hand—they were not 
directly guided by a policy, explaining in part the variation across the three case studies. This 
approach makes sense for the Foundation, which hires experts with extensive training and 
experience. Rather than prescribing evaluation activities in a top-down, cookie-cutter fashion, 
the Foundation trusts that these experts will draw on their expertise to inform the best design 
and activities for a given situation. Overcodification might limit evaluators’ ability to be 
responsive to particular conditions. The result, however, is that when learning happens about 
evaluation practice, it happens in silos. While improving practice based on reflection is impor- 
tant, it can be challenging from an organizational perspective. The Foundation does not seem to 
have a mechanism to support learning about evaluation policies and practice in a top-down 
fashion or a way to benefit from shared learning. 

The Foundation does have structures in place to support learning in other arenas. For example, 
R&E staff review findings from evaluations and survey key informants to prepare annual reports 
designed to help the Foundation learn about its progress toward strategic goals. These reports are 
made public and discussed internally at the Foundation. By contrast, much of the experiential 
knowledge gained from participating in research and evaluation lies with a single individual and 
is typically only shared informally. 

Another way to examine evaluation policy implications is to consider the policies as a system—a 
set of connected points that make up a more complex whole and work together toward specific 
objectives (Meadows, 2008). In this way, the policies themselves might not be the most important 
unit of analysis. Rather, the connections within policies and between policies and practice are 
important places to focus attention. Thinking of policies as a system has implications for both their 
development and the ways in which they bear on evaluation practice. To construct policies from a 
system perspective, rather than generating a series of isolated rules and guidelines, individual 
policies should be grounded in their desired outcomes. 

This study offers several insights into this potential systems approach for developing evaluation 
policies. Overall, where the Foundation was very consistent in its language around evaluation policy 
goals, the operational policies were inconsistently implemented—goals and operations were two 
disconnected facets of an evaluation policy rather than a unified system. For example, the Founda- 
tion had significant and carefully detailed policies about stakeholder involvement, but those were 
inconsistently followed in practice. Instead, program officers and evaluators valued the use of 
stakeholders, but facilitated their involvement in different ways depending on the situation. A 
systems approach to evaluation policy would be tolerant of these variations, but be grounded in 
why stakeholders should be involved. For example, a why statement about evaluation policies 
governing stakeholder involvement might read, “evaluation findings are more likely to be used 
when stakeholders are involved in the evaluation process.” In an evaluation policy system, this 
statement describes the desired outcome (use of evaluation findings) that would guide and connect 
the operational policies around involving stakeholders. Even though involving stakeholders in 
evaluation practice is widely regarded as a good idea on its own merits, connecting an explicit 
stakeholder involvement policy to a why statement with an explicit goal points to how their 
involvement might be realized. Specifically, evaluators could decide how to involve stakeholders 
by considering which forms of involvement would best promote use of findings. In a systems 
approach to building and enacting evaluation policy, a key aspect of implementing and enforcing it 
would be a constant interrogation of the connections between policy goals and operational decisions. 

The smooth case study offers another example of how this policy system might work. In this case, 
the way that the implementation and impact evaluations worked together profoundly and positively 
impacted the evaluation process. Rather than writing isolated rules that suggest closely coordinating 
implementation and impact evaluations as a best practice, there could be more explicit policies about 
how implementation and impact evaluations should work together, rooted in the why: that impact 
evaluations are strengthened when they are attentive to implementation issues. Thus, the 
implementation of a related policy would raise the question of how and in what ways this particular 
evaluation could be strengthened through closer connections across implementation and impact studies. 
The questioning of the connections between the goals and the actions taking place to fulfill them 
would serve as the mechanism that promotes the health of the evaluation policy system. 

Finally, an evaluation policy system may have prevented the challenging case from ever happen- 
ing. In this instance, there could be a basic policy goal detailing the why that describes a reason for 
ensuring that programs are evaluable—that evaluations can only produce accurate and actionable 
results when the evaluand is ready for rigorous study. In the challenging case, if the program officer 
and/or evaluator were to constantly and carefully consider how they might ensure the evaluand’s 
readiness, they might have realized the practical impossibilities much sooner and been able to stop 
the study or take a different evaluation approach entirely (examining processes and implementation 
more in depth rather than impacts). However, this may be an oversimplification—in some cases, the 
political context may be too powerful a force for thoughtful evaluation policies to counteract. The 
challenging case poses a reminder that even the most careful and well-thought-out evaluation policies 
exist within a larger context that bears on policy implementation. 


Implications and Directions for Future Research 


This study scratches the surface of our understanding of how evaluation policy affects practice. 
Although individuals who are hired to conduct evaluations will have, at best, limited ability to shape 
the policies that guide their work, an awareness of how they function and influence practice is 
helpful. For example, understanding an organization’s evaluation policy goals may help evaluators 
better respond to RFPs. Beyond this immediate implication, however, are broader ramifications for 
the practice and study of evaluation. 

Constructing evaluation policies as a system within the organization could support organizational 
learning around evaluation practice. For example, an internal meeting at the conclusion of a project 
could provide a forum for sharing lessons that can then be synthesized and shared organization wide. 
Explicit policies about this type of information sharing would help ensure that it takes place. 

Future research could explore how specific areas of evaluation practice interact with explicit 
evaluation policies. It would also be worthwhile to explore how evaluation policy can be codified 
while allowing flexibility that supports necessary adaptations in the field. Likewise, a simulation 
study in which evaluators are assigned to separate conditions—one where the evaluation is guided 
by very explicit policies over certain areas and the other in which fewer rules are imposed—and then 
asked to design evaluations could reveal how policies affect evaluation design. Furthermore, this 
type of study could reveal the effects of policies on program, organizational, and/or stakeholder 
engagement in evaluation in addition to the perceived quality and usefulness of evaluation practice 
and how evaluation results might eventually be used. The work presented here also points to more 
applied areas of future research: A similar study in a different nonprofit organization might help to 
replicate these results, and attention could also be turned to federal or state bodies that commission 
evaluations to build understanding of what evaluation policies mean for practical work. 


Limitations 


All types of inquiry have limitations. The study approach taken has the potential to be subject to 
researcher biases and particular ways of viewing the world. To address this, systematic records of all 
data collection efforts and sources were kept and different types of data sources were used within 
each phase of the study to triangulate findings. The analysis was also informed by previous scholar- 
ship, and the conclusions are linked to extant theories. Second, the three selected cases present only a 
snapshot of how policies were enacted within a particular context, and it is challenging to generalize 
these findings to the practices of the entire Foundation. Likewise, the RWJF is a unique organiza- 
tion, so these findings cannot be directly applied to other settings. Nevertheless, the conclusions 
offer insight into how, broadly speaking, evaluation policies might help influence evaluation prac- 
tice. Third, Foundation staff provided access to internal documents and interview participants, 
leaving the study open to questions about whether these data might be biased. This potential 
limitation was initially addressed through the researcher establishing criteria for case selection that 
included examples of both positive and negative evaluation experiences. The Foundation supported 
these criteria and facilitated selection of the challenging case, which was not a reflection of the 
Foundation’s best efforts, as acknowledged by individuals within and outside of the Foundation. 
Additionally, one of the case studies concluded several years ago and some participants did not 
remember specifics. Thus, interview responses were corroborated by document analysis. Finally, the 
researcher presented both intermediate and final study findings to representatives at the Foundation 
to member-check results. The Foundation representatives offered additional insight into interpreta- 
tions that are reflected in the study’s conclusions. 


Conclusion 


Program evaluation plays a significant role in aiding our understanding of the effectiveness of 
interventions. Evaluations operate under the auspices of evaluation policies that shape aspects of 
evaluation design, including research questions, data collection and analysis procedures, and reporting of 
findings. This study suggests that evaluation policies affect how evaluators and foundation staff do 
their work, but that this effect differs across programs and settings. Developing evaluation policies 
as a system might allow for both appropriate guidance and sufficient flexibility in their implemen- 
tation. Designing organizational evaluation policies that are internally integrated and grounded in 
their intended outcomes could have great potential to increase the usefulness of evaluation work in 
these settings. 


Notes 


1. At the time of this study, Robert Wood Johnson Foundation was undergoing significant organizational 
changes, some of which reflect the Foundation’s commitment to evaluation and learning. R&E was renamed 
Research-Evaluation-Learning (REL); the role of “R&E Officer” was renamed “REL Liaison”; and 
funding (REL’s operating budget) was restructured to enhance its capabilities. The description here pertains to 
the organization and operations in place at the beginning of data collection processes in spring of 2013. 

2. All names and some identifying details of the cases have been redacted and/or changed for confidentiality 
purposes. 